  • Introduction
    • What is a code violation?
    • Why housing code violations?
  • Use Cases
    • Case 1: Protecting Tenants and Residents
    • Case 2: Cold, Hard Cash
  • Predicting Violations
    • Actually… Predicting Parcels
    • By the Numbers
    • Types of Violations
  • The Predictors
    • Time
    • Property Attributes
    • Others & Next Steps
  • Caveats
    • Inherent Bias
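library(tidyverse)
library(sf)
library(leaflet)

# Bounding box for Syracuse from OpenStreetMap
# (matrix with rows "x"/"y" and columns "min"/"max")
osm_bb <- osmdata::getbb("Syracuse,NY")

# Base map shared by every figure: background tiles, zoomed to the city bounds
leaf <- function(data, options = leafletOptions()) {
  leaflet(data, options = options) %>%
    addProviderTiles(providers$Stamen.TonerBackground) %>%
    fitBounds(osm_bb["x", "min"], osm_bb["y", "min"],
              osm_bb["x", "max"], osm_bb["y", "max"])
}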

Introduction

What is a code violation?

Cities produce and maintain building codes in order to ensure that buildings and facilities are safe for residents, employees, customers, and all others who may interact with the space. Committing a code violation implies that the property did not adhere to at least one of the required standards, making the property more likely to cause harm or health issues to tenants.

There are hundreds of building codes and dozens of national standards, almost all of them overlapping. For this study, we’ll have to filter down and re-code the types of violations into manageable categories, but we’ll get to that later.
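As a rough illustration of what re-coding into categories can look like, the sketch below joins raw violation labels to a small crosswalk table and counts records per category. The table names, column names, toy records, and the category assigned to each label are all assumptions made for the example; they are not the crosswalk used in this study.

```r
library(dplyr)

# Hypothetical crosswalk: one row per raw violation label, mapped to a
# coarse category. The category assignments are illustrative assumptions.
crosswalk <- tribble(
  ~violation_name,                           ~category,
  "2010 IMC - Section 704.2 - Smoke alarms", "Safety",
  "2010 IMC - Section 308.1 - Infestation",  "Health",
  "SPCC - Section 27-72 (f) - Overgrowth",   "Exterior"
)

# Toy violation records, made up for the example.
raw <- tibble(
  violation_id   = 1:4,
  violation_name = c("2010 IMC - Section 704.2 - Smoke alarms",
                     "2010 IMC - Section 308.1 - Infestation",
                     "SPCC - Section 27-72 (f) - Overgrowth",
                     "2010 IMC - Section 704.2 - Smoke alarms")
)

# Attach the category to each record, then count records per category.
recoded <- raw %>%
  left_join(crosswalk, by = "violation_name") %>%
  count(category, name = "Count", sort = TRUE)
```

Naming the join key explicitly with `by` keeps the join from silently matching on any other column the two tables happen to share.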

Why housing code violations?

Code inspection and enforcement is currently reactive rather than proactive. Most inspections to date have happened either because someone filed a complaint or because an inspector noticed a visible issue while driving past a property. We aim to make the process proactive by helping inspectors prioritize which parcels and neighborhoods to inspect.

Syracuse is in the process of distributing inspection practices across 22 districts, and our findings can supplement this transition and direct investment at this aggregated level.

doce_sf <- read_sf("../data/DOCE_Catchments")

doce <- leaf(doce_sf) %>%
  addPolygons(weight = 2,
              popup = ~FID_Export)

Using data from the City of Syracuse on buildings, neighborhoods, and other characteristics, we are developing an algorithm that supports inspection decisions by predicting which properties are likely to violate code. Doing so should save inspectors time and, in the process, save the city money and resources.
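One way such predictions could feed into an inspector's workflow is a short ranked list per district. The sketch below assumes a hypothetical table of model output with one predicted probability per parcel; the parcel IDs, district numbers, scores, and the cutoff of two parcels per district are made up for illustration.

```r
library(dplyr)

# Hypothetical model output: one row per parcel with a predicted
# probability of violation. All values here are invented.
scored <- tibble(
  parcel_id = c("p1", "p2", "p3", "p4", "p5", "p6"),
  district  = c(1, 1, 1, 2, 2, 2),
  pred_prob = c(0.10, 0.80, 0.45, 0.30, 0.05, 0.60)
)

# Keep the two highest-risk parcels in each district, highest first.
inspection_list <- scored %>%
  group_by(district) %>%
  slice_max(pred_prob, n = 2, with_ties = FALSE) %>%
  ungroup()
```

Ranking within each district rather than citywide matches the idea of one inspector per zone, so every inspector gets a list of comparable length.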

Use Cases

Case 1: Protecting Tenants and Residents

We see two different uses for this algorithm. The first, and more noble, is to protect tenants and residents by making sure they are safe at times when they are especially vulnerable. For example, ahead of a winter storm, the City of Syracuse could send a mailer to all residents whose units may not have functional heating or insulation. If the city plans to change water service, it could build a mailing list and reach out to residents who may be at risk of lead exposure. If construction is taking place near homes and office buildings with asbestos risk, the city could notify nearby residents so they can take proper precautions.

Case 2: Cold, Hard Cash

A second use case is for the City of Syracuse to generate revenue by accurately targeting the residential and commercial buildings that are least likely to be meeting all code standards. The city has recently employed 23 inspectors, each with a region to manage. Given this investment, the City surely wants to know that the added spending on code enforcement is a good use of taxpayer dollars and that the increased focus leads to more violations being resolved. Whereas the majority of code violations to date have been reported by call-in, the hiring of inspectors implies that the city wants to assess its buildings against code standards more rigorously. Using an algorithm to pinpoint likely code violators will help the City generate more revenue with fewer resources.

Predicting Violations

Actually… Predicting Parcels

The City of Syracuse is organized into 55 census tracts and 28 zip codes. Additionally, the city is being subdivided into 22 inspector zones, per below. Even though these districts can direct high-level investment, they do not predict exactly where violations are going to happen.

Because many predictive features are parcel-specific, our model aims to predict whether each individual property will have a violation. The dependent variable is binary: 1 for parcels that are expected to have a violation, and 0 for those that are not.
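A minimal sketch of how a binary outcome like this relates to the district-level percentages: flag each parcel, then average the flag within each district. The toy table below is invented, and treating a parcel with no violation record (NA) as 0 is an assumption made for the example, not a rule taken from the city's data.

```r
library(dplyr)

# Toy parcel table; names mirror viol_ct / viol_here / viol_pct used in
# this report, but the values are made up.
parcels <- tibble(
  parcel_id = c("a", "b", "c", "d", "e"),
  district  = c(1, 1, 1, 2, 2),
  viol_ct   = c(0, 3, NA, 1, 0)   # NA: no violation record (assumed to mean none)
)

# Binary outcome: 1 if the parcel has at least one violation, else 0.
parcels <- parcels %>%
  mutate(viol_here = as.integer(coalesce(viol_ct, 0) > 0))

# Share of parcels with a violation in each district.
district_pct <- parcels %>%
  group_by(district) %>%
  summarize(viol_pct = mean(viol_here), .groups = "drop")
```

The mean of a 0/1 flag is the proportion of ones, so the same parcel-level variable supports both the parcel map and the district summary.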

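# District-level map: share of parcels in each district with a violation
pal_pct <- colorNumeric(palette = "YlGnBu",
                        domain = doce_sf$viol_pct)
lf_pct <- leaf(doce_sf) %>% 
  addPolygons(stroke = FALSE,
              fillColor = ~pal_pct(viol_pct),
              fillOpacity = 0.8) %>%
  addLegend("bottomright", 
            pal = pal_pct, 
            values = ~viol_pct)

# Parcel-level map: viol_here flags parcels with at least one violation.
# data_sf (the parcel layer) is not created in the code shown here.
pal_parc <- colorNumeric(palette = "YlGnBu",
                         na.color = "#081d58",
                         domain = data_sf$viol_here)
# Only the first 12,000 parcels are drawn
parc <- leaf(data_sf %>% head(12000),
             options = leafletOptions(minZoom = 15)) %>%
  addPolygons(weight = 1, color = "gray",
              # fill by the same variable the palette and legend are built on
              fillColor = ~pal_parc(viol_here),
              fillOpacity = 0.8) %>%
  addLegend("bottomright", 
            pal = pal_parc, 
            values = ~viol_here)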
Map: percent of parcels with a violation, by inspection district (legend: 15% to 40%)

Map: parcels with at least one violation (legend: 0.0 to 1.0)

By the Numbers

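# Crosswalk from raw violation codes to codebook names
# (expected to provide imc_crosswalk)
load("../data/viol_crosswalk.RData")

violations <- readxl::read_excel("../data/UPenn MUSA/Violations.xlsx") %>%
  select(-zone_type_description,
         Status = status_type_name) %>% 
  distinct() %>%
  # no `by` given, so the join uses every column name the two tables share
  left_join(imc_crosswalk) %>%
  rename(`Violation Type` = codebook_name)

# Number of non-void violations, used in the chart title
ct <- violations %>% 
  filter(Status != "void") %>% 
  nrow()
# Earliest year on record
yr <- violations %>% 
  pull(violation_date) %>%
  lubridate::ymd_hms() %>%
  lubridate::year() %>% min()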

violations %>%
  group_by(Status) %>%
  summarize(Count = n()) %>%
  ggplot(aes(x = "",
             y = Count,
             fill = Status,
             label = Count)) +
  geom_bar(stat = "identity") +
  coord_polar("y", start = 0) +
  geom_text(aes(label = Count),
            color = "black",
            position = position_stack(vjust = 0.7)) +
  scale_fill_manual(values = c("lightblue","yellowgreen","gray")) + 
  labs(x = NULL, y = NULL,
       title = paste0(ct," non-void violations since ",yr)) +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank())

Types of Violations

# Most frequent violation types, largest first
violations %>% 
  group_by(`Violation Type`) %>%
  summarize(Count = n()) %>%
  arrange(desc(Count)) %>%
  head(20)
| Violation Type | Count |
|---|---|
| SPCC - Section 27-72 (e) -Trash & Debris | 4033 |
| SPCC - Section 27-72 (f) - Overgrowth | 3649 |
| 2010 IMC - Section 305.3 - Interior surfaces | 2755 |
| 2010 IMC - Section 504.1 - General | 1571 |
| 2010 IMC - Section 308.1 - Infestation | 1458 |
| SPCC - Section 27-32 (d) Protective coating for wood surfaces | 1039 |
| 2010 IMC - Section 305.1 - General | 999 |
| 2010 IMC - Section 603.1 - Mechanical appliances | 976 |
| 2010 IMC - Section 304.13 - Window, skylight and door frames | 974 |
| SPCC - Section 27-57 (a) (19) - Switch/Outlet is Damaged/ Unserviceable | 953 |
| 2010 IMC - Section 704.2 - Smoke alarms | 942 |
| 2015 IMPC - 305.3 - Interior Surfaces | 935 |
| 2010 IMC - Section 304.15 - Doors | 885 |
| SPCC - Section 27-31 (c) Structural members | 835 |
| 2010 IMC - Section 107.1.3 - Premises Unfit for Human Occupancy | 789 |

The Predictors

Time

# Parse the violation date once and tag each record as Health or Safety;
# records that end up with no category are dropped
viol_dated <- violations %>%
  mutate(Date     = lubridate::ymd_hms(violation_date),
         Month    = lubridate::month(Date, label = TRUE),
         Year     = lubridate::year(Date),
         Category = NA,
         Category = ifelse(health_violation, "Health", Category),
         Category = ifelse(safety_violation, "Safety", Category)) %>%
  select(Year, Month, Category) %>%
  drop_na()

# Violations per calendar month, pooled across all years
viol_month <- viol_dated %>%
  group_by(Month, Category) %>%
  summarize(Count = n(), .groups = "drop")

# theme_plot() is a custom ggplot theme that is not defined in the code shown here
ggplot(viol_month,
       aes(x = Month, 
           color = Category, 
           fill = Category,
           y = Count, 
           group = Category)) +
  geom_line(linewidth = 2) +   # linewidth needs ggplot2 >= 3.4; use size on older versions
  geom_area(alpha = 0.25, 
            position = "identity") + 
  theme_plot()

# Same counts, split out by year for the faceted plot
viol_year <- viol_dated %>%
  group_by(Year, Month, Category) %>%
  summarize(Count = n(), .groups = "drop")

ggplot(viol_year,
       aes(x = Month, 
           color = Category, 
           fill = Category,
           y = Count, 
           group = Category)) +
  geom_line(linewidth = 2) +
  geom_area(alpha = 0.25, 
            position = "identity") + 
  theme_plot() +
  facet_wrap(~Year)

Property Attributes

Others & Next Steps

pal_pct <- colorNumeric(palette = "YlGnBu",
                        domain = doce_sf$viol_pct)
lf_pct <- leaf(doce_sf) %>% 
  addPolygons(stroke = FALSE,
              fillColor = ~pal_pct(viol_pct),
              fillOpacity = 0.8) %>%
  addLegend("bottomright", 
            pal = pal_pct, 
            values = ~viol_pct)
Map: percent of parcels with a violation, by inspection district (legend: 15% to 40%)

Caveats

Inherent Bias

To date, violations have been reported either by call-in or through a cursory external inspection by an inspector driving by. The recorded violations therefore skew toward problems that are visible from outside, which biases the kinds of violations in the training set. Several other kinds of violations are likely missing from the data because inspections so far have been external only. Moreover, our algorithm is trained on data from the City of Syracuse, whose mix and frequency of code violations may not be representative of other cities.